Add support for op_block_list #1036
Conversation
Very useful! Thanks! Just for my testing, is the model you're testing available on the Hugging Face Hub?
@xenova I have the exported model available here: https://huggingface.co/pdufour/Qwen2-VL-2B-Instruct-ONNX-Q4-F16, but I haven't uploaded the source files. It might be easier to try a smaller example; I've updated the description of the PR if you want to try that one.
One curious behaviour: if you provide an op_block_list, it no longer includes the defaults (https://github.com/microsoft/onnxconverter-common/blob/master/onnxconverter_common/float16.py#L141). I'm not sure if that's what you want; we could also merge in the defaults if that's preferred, but then it would be impossible to clear them.
Good point! We should then default to None.
@xenova Updated the PR to use None and added some more comprehensive tests in the description.
```python
blocked_ops = set(float16.DEFAULT_OP_BLOCK_LIST)
if op_block_list is not None:
    blocked_ops.update(op_block_list)
```
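For context, here is a minimal sketch of how this merged set can flow into the fp16 conversion. `convert_float_to_float16` and `DEFAULT_OP_BLOCK_LIST` are the real onnxconverter-common API; the wrapper function around them is illustrative, not the exact transformers.js code:

```python
import onnx
from onnxconverter_common import float16

def to_fp16(model_path, op_block_list=None):
    # Start from the library defaults (ops with no fp16 kernel, e.g. Range),
    # then add any user-supplied op types on top.
    blocked_ops = set(float16.DEFAULT_OP_BLOCK_LIST)
    if op_block_list is not None:
        blocked_ops.update(op_block_list)

    model = onnx.load(model_path)
    # Ops in the block list are kept in fp32; everything else is cast to fp16.
    return float16.convert_float_to_float16(model, op_block_list=list(blocked_ops))
```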
|
|
One minor limitation of this updated approach is that you can't choose to quantize an op that is in the default block list. Most of those ops are blocked because they have no fp16 variants, so I don't think this is an issue.
TL;DR: you can only add to the block list.
xenova left a comment:
Thanks!
Background
Added a new argument to the quantize script called op_block_list. If op_block_list is provided, those ops are excluded from quantization. This is useful because some models contain ops that are incompatible with quantization.
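For example, to block a few op types when quantizing (the flag and op names below are the ones exercised in the test plan; the folder paths are placeholders):

```
python3 -m scripts.quantize --input_folder ./onnx --output_folder ./onnx --mode q4f16 --op_block_list Conv DynamicQuantizeLinear DequantizeLinear Resize
```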
Test Plan
Regression Test
Verify that nothing changes when no op_block_list is passed.

- Clone the test model: `git clone https://huggingface.co/onnx-models/sentence-t5-base-onnx`
- This model has a /model/model.0/auto_model/encoder/block.0/layer.0/SelfAttention/Range node, so we are checking that it is still excluded from fp16 conversion because Range is part of the default exclude types (https://github.com/microsoft/onnxconverter-common/blob/master/onnxconverter_common/float16.py#L108).
- On the main branch of transformers.js:

```
git checkout . && rm -rf ./*_*.onnx || true && PYTHONPATH=../transformers.js ../transformers.js/.venv/bin/python3 -m scripts.quantize --input_folder . --output_folder . --mode fp16
stat -f "%z" model_fp16.onnx
```

- On this PR branch:

```
git checkout . && rm -rf ./*_*.onnx || true && PYTHONPATH=../transformers.js ../transformers.js/.venv/bin/python3 -m scripts.quantize --input_folder . --output_folder . --mode fp16
stat -f "%z" model_fp16.onnx
```

- Compare the two stat outputs; matching file sizes indicate the default conversion is unchanged (see the sketch after this list for a closer check).
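As a closer sanity check that the Range node stayed in fp32, one could tally tensor element types in the converted model. This snippet is illustrative and not part of the PR; it assumes the onnx package is installed and is run in the model folder:

```python
import onnx
from collections import Counter

# Tally initializer data types in the converted model: converted weights are
# fp16 (elem_type 10), while tensors feeding default-blocked ops such as
# Range should remain fp32 (elem_type 1).
model = onnx.load("model_fp16.onnx")
print(Counter(init.data_type for init in model.graph.initializer))
```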
Qwen2-VL Test

Verify that op_block_list works.

- Without op_block_list:

```
rm -rf onnx/*_*_*.onnx
PYTHONPATH=../transformers.js ../transformers.js/.venv/bin/python3 -m scripts.quantize --input_folder ./onnx --output_folder ./onnx-dest --mode q4f16
python3 infer.py Qwen/Qwen2-VL-2B-Instruct ./onnx
```

Inference fails to load the model because one input of a Sub node stayed fp32 while the other was converted to fp16:

```
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from ./onnx/QwenVL_A_q4f16.onnx failed:Type Error: Type parameter (T) of Optype (Sub) bound to different types (tensor(float) and tensor(float16) in node (/Sub).
```

- With op_block_list:

```
rm -rf onnx/*_*_*.onnx
PYTHONPATH=../transformers.js ../transformers.js/.venv/bin/python3 -m scripts.quantize --input_folder ./onnx --output_folder ./onnx --mode q4f16 --op_block_list Conv DynamicQuantizeLinear DequantizeLinear Resize
python3 infer.py Qwen/Qwen2-VL-2B-Instruct ./onnx
```

Inference now succeeds and produces a sensible description: "The image shows a vintage teal-colored..."